239 research outputs found

Frobenius norm regularization for the multivariate von Mises distribution

Author: Bielza Lozoya María Concepción
Larrañaga Múgica Pedro María
Rodríguez Luján Luis
Publication venue: 'Wiley'
Publication date: 01/01/2016

Penalizing the model complexity is necessary to avoid overfitting when the number of data samples is low with respect to the number of model parameters. In this paper, we introduce a penalization term that places an independent prior distribution for each parameter of the multivariate von Mises distribution. We also propose a circular distance that can be used to estimate the Kullback–Leibler divergence between any two circular distributions as a goodness-of-fit measure. We compare the resulting regularized von Mises models on synthetic data and real neuroanatomical data to show that the distribution fitted using the penalized estimator generally achieves better results than the nonpenalized multivariate von Mises estimator

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Towards Gaussian Bayesian network fusion

Author: Bielza Lozoya María Concepción
Córdoba Sánchez Irene
Larrañaga Múgica Pedro María
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Data sets are growing in complexity thanks to the increasing facilities we have nowadays to both generate and store data. This poses many challenges to machine learning that are leading to the proposal of new methods and paradigms, in order to be able to deal with what is nowadays referred to as Big Data. In this paper we propose a method for the aggregation of different Bayesian network structures that have been learned from separate data sets, as a first step towards mining data sets that need to be partitioned in a horizontal way, i.e. with respect to the instances, in order to be processed. Considerations that should be taken into account when dealing with this situation are discussed. Scalable learning of Bayesian networks is slowly emerging, and our method constitutes one of the first insights into Gaussian Bayesian network aggregation from different sources. Tested on synthetic data, it obtains good results that surpass those from individual learning. Future research will be focused on expanding the method and testing more diverse data sets

Archivo Digital UPM

Multi-facet determination for clustering with Bayesian networks

Author: Bielza Lozoya María Concepción
Larrañaga Múgica Pedro
Rodríguez-Sánchez Fernando
Publication venue: E.T.S. de Ingenieros Informáticos (UPM)
Publication date: 01/01/2017

Real world applications of sectors like industry, healthcare or finance usually generate data of high complexity that can be interpreted from different viewpoints. When clustering this type of data, a single set of clusters may not suffice, hence the necessity of methods that generate multiple clusterings that represent different perspectives. In this paper, we present a novel multi-partition clustering method that returns several interesting and non-redundant solutions, where each of them is a data partition with an associated facet of data. Each of these facets represents a subset of the original attributes that is selected using our information-theoretic criterion UMRMR. Our approach is based on an optimization procedure that takes advantage of the Bayesian network factorization to provide high quality solutions in a fraction of the time

Archivo Digital UPM

Mining multi-dimensional concept-drifting data streams using Bayesian network classifiers

Author: Bielza Lozoya María Concepción
Borchani Hanen
Gama João
Larrañaga Múgica Pedro María
Publication venue: 'IOS Press'
Publication date: 01/01/2016

In recent years, a plethora of approaches have been proposed to deal with the increasingly challenging task of mining concept-drifting data streams. However, most of these approaches can only be applied to uni-dimensional classification problems where each input instance has to be assigned to a single output class variable. The problem of mining multi-dimensional data streams, which includes multiple output class variables, is largely unexplored and only a few streaming multi-dimensional approaches have been recently introduced. In this paper, we propose a novel adaptive method, named Locally Adaptive-MB-MBC (LA-MB-MBC), for mining streaming multi-dimensional data. To this end, we make use of multi-dimensional Bayesian network classifiers (MBCs) as models. Basically, LA-MB-MBC monitors the concept drift over time using the average log-likelihood score and the Page-Hinkley test. Then, if a concept drift is detected, LA-MB-MBC adapts the current MBC network locally around each changed node. An experimental study carried out using synthetic multi-dimensional data streams shows the merits of the proposed method in terms of concept drift detection as well as classification performance

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Learning tractable multidimensional Bayesian network classifiers

Author: Benjumeda Barquita Marco Alberto
Bielza Lozoya María Concepción
Larrañaga Múgica Pedro María
Publication venue: E.T.S. de Ingenieros Informáticos (UPM)
Publication date: 01/01/2016

Multidimensional classification has become one of the most relevant topics in view of the many domains that require a vector of class values to be assigned to a vector of given features. The popularity of multidimensional Bayesian network classifiers has increased in the last few years due to their expressive power and the existence of methods for learning different families of these models. The problem with this approach is that the computational cost of using the learned models is usually high, especially if there are a lot of class variables. Class-bridge decomposability means that the multidimensional classification problem can be divided into multiple subproblems for these models. In this paper, we prove that class-bridge decomposability can also be used to guarantee the tractability of the models. We also propose a strategy for efficiently bounding their inference complexity, providing a simple learning method with an order-based search that obtains tractable multidimensional Bayesian network classifiers. Experimental results show that our approach is competitive with other methods in the state of the art and ensures the tractability of the learned models

Archivo Digital UPM

Data publications correlate with citation impact

Author: Bielza Lozoya María Concepción
Hill Sean L.
Larrañaga Múgica Pedro María
Leitner Florian
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2016

Neuroscience and molecular biology have been generating large datasets over the past years that are reshaping how research is being conducted. In their wake, open data sharing has been singled out as a major challenge for the future of research. We conducted a comparative study of citations of data publications in both fields, showing that the average publication tagged with a data-related term by the NCBI MeSH (Medical Subject Headings) curators achieves a significantly larger citation impact than the average in either field. We introduce a new metric, the data article citation index (DAC-index), to identify the most prolific authors among those data-related publications. The study is fully reproducible from an executable Rmd (RMarkdown) script together with all the citation datasets. We hope these results can encourage authors to more openly publish their data

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Anomaly detection with a spatio-temporal tracking of the laser spot

Author: Atienza González David
Bielza Lozoya María Concepción
Díaz Rozo Javier
Larrañaga Múgica Pedro María
Publication venue: 'IOS Press'
Publication date: 01/01/2016

Anomaly detection is an important problem with many applications in industry. This paper introduces a new methodology for detecting anomalies in a real laser heating surface process recorded with a high-speed thermal camera (1000 fps, 32×32 pixels). The system is trained with non-anomalous data only (32 videos with 21500 frames). The proposed method is built upon kernel density estimation and is capable of detecting anomalies in time-series data. The classification should be completed in-process, that is, within the cycle time of the workpiece

Archivo Digital UPM

Decision functions for chain classifiers based on Bayesian networks for multi-label classification

Author: Bielza Lozoya María Concepción
Larrañaga Múgica Pedro María
Varando Gherardo
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015

Multi-label classification problems require each instance to be assigned a subset of a defined set of labels. This problem is equivalent to finding a multi-valued decision function that predicts a vector of binary classes. In this paper we study the decision boundaries of two widely used approaches for building multi-label classifiers, when Bayesian network-augmented naive Bayes classifiers are used as base models: the binary relevance method and chain classifiers. In particular, extending previous single-label results to multi-label chain classifiers, we find polynomial expressions for the multi-valued decision functions associated with these methods. We prove upper bounds on the expressive power of both methods and we prove that chain classifiers provide a more expressive model than the binary relevance method

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Dynamic Bayesian network-based anomaly detection for in-process visual inspection of laser surface heat treatment

Author: Bielza Lozoya María Concepción
Díaz Rozo Javier
Larrañaga Múgica Pedro María
Ogbechie Condes Alberto
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

We present the application of a cyber-physical system for in-process quality control based on the visual inspection of a laser surface heat treatment process. To do this, we propose a classification framework that detects anomalies in recorded video sequences that have been preprocessed using a clustering-based method for feature subset selection. One peculiarity of the classification task is that there are no examples with errors, since major irregularities seldom occur in efficient industrial processes. Additionally, the parts to be processed are expensive so the sample size is small. The proposed framework uses anomaly detection, cross-validation and sampling techniques to deal with these issues. Regarding anomaly detection, dynamic Bayesian networks (DBNs) are used to represent the temporal characteristics of the normal process. Experiments are conducted with two different types of DBN structure learning algorithms, and classification performance is assessed on both anomaly-free examples and sequences with anomalies simulated by experts

Archivo Digital UPM

Directional naive Bayes classifiers

Author: Bielza Lozoya María Concepción
Larrañaga Múgica Pedro María
López-Cruz Pedro L.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Directional data are ubiquitous in science. These data have some special properties that rule out the use of classical statistics. Therefore, different distributions and statistics, such as the univariate von Mises and the multivariate von Mises–Fisher distributions, should be used to deal with this kind of information. We extend the naive Bayes classifier to the case where the conditional probability distributions of the predictive variables follow either of these distributions. We consider the simple scenario, where only directional predictive variables are used, and the hybrid case, where discrete, Gaussian and directional distributions are mixed. The classifier decision functions and their decision surfaces are studied at length. Artificial examples are used to illustrate the behavior of the classifiers. The proposed classifiers are then evaluated over eight datasets, showing competitive performances against other naive Bayes classifiers that use Gaussian distributions or discretization to manage directional data

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM